Maintain replication connection between sync flows #1211

serprex · 2024-02-06T18:41:09Z

Currently we reconnect with each sync flow, which requires starting replication repeatedly. This can take an exceedingly long time for some workloads on some databases.

Fix: use a Temporal session to share state between activities, use a single source connector throughout the cdc flow, & move the replication connection back into the source connection

flow/activities/flowable.go

…anceled (#1214) Thinking in #1211 the workflow is exiting with an error, which we then ignore, letting the test pass but cleanup fail

serprex · 2024-02-07T21:31:13Z

Kevin pointed out a problem: in non-parallel sync-normalize, a long normalize will leave the connection silent for too long & postgres can drop the connection. We need keepalive logic. MaintainPull could work, but then we need to make sure to synchronize between MaintainPull & StartFlow
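
To make the keepalive concern concrete, here is a minimal sketch, assuming a ticker-driven loop that shares a mutex with the sync path. `fakeReplConn`, `sendKeepalive`, `pull`, and `demo` are hypothetical names for illustration; the `maintainPull` function here only borrows the name from the comment above and is not the PR's actual implementation.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// fakeReplConn is a hypothetical stand-in for a replication connection.
// The mutex serializes the keepalive loop and the sync path.
type fakeReplConn struct {
	mu         sync.Mutex
	keepalives int
	pulls      int
}

// sendKeepalive simulates a status update that keeps the connection alive.
func (c *fakeReplConn) sendKeepalive() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.keepalives++
}

// pull simulates a sync flow reading from the same connection.
func (c *fakeReplConn) pull() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pulls++
}

// maintainPull sends a keepalive on every tick until ctx is canceled.
func maintainPull(ctx context.Context, c *fakeReplConn, every time.Duration) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			c.sendKeepalive()
		}
	}
}

// demo runs one pull followed by a simulated long normalize and reports
// how many pulls and keepalives the connection saw.
func demo(normalizeMillis int) (pulls, keepalives int) {
	ctx, cancel := context.WithCancel(context.Background())
	conn := &fakeReplConn{}
	done := make(chan struct{})
	go func() {
		defer close(done)
		maintainPull(ctx, conn, time.Millisecond)
	}()
	conn.pull()
	time.Sleep(time.Duration(normalizeMillis) * time.Millisecond)
	cancel()
	<-done // the keepalive goroutine has exited, so the counters are stable
	return conn.pulls, conn.keepalives
}

func main() {
	pulls, keepalives := demo(50)
	fmt.Println("pulls:", pulls, "keepalives sent:", keepalives > 0)
}
```

Because both paths take the same lock, a keepalive can never interleave with a pull on the connection; whether a plain mutex is the right synchronization for the real connector is an open design question in this thread.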

flow/workflows/cdc_flow.go

iskakaushik · 2024-02-09T13:21:05Z

flow/activities/flowable.go

@@ -45,9 +45,10 @@ type SlotSnapshotSignal struct {
 type FlowableActivity struct {
 	CatalogPool *pgxpool.Pool
 	Alerter     *alerting.Alerter
+	CdcCacheRw  sync.RWMutex


Maybe have a replication connection manager struct that takes care of:

- This map and locking.
- Keeping track of the connection health and lifecycle.
- Any other additional metadata pertaining to the replication connection.

Tracking connection health & lifecycle is part of MaintainPull, which needs to exist either way to keep heartbeating the session. Additional metadata belongs in the connector, where we avoid contention on CdcCacheRw

I could see moving replState/replConn out of the connector, then storing a struct { replState, replConn, connector } as the value of the hashmap. But for now, putting it all in the connector works about the same
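
For readers following the map-and-locking discussion, this is a minimal sketch of what such a cache could look like, assuming a map keyed by session name and guarded by a `sync.RWMutex`. The names `cdcCache`, `cdcEntry`, `replConn`, `getOrCreate`, and `remove` are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// replConn is a hypothetical stand-in for a replication connection.
type replConn struct {
	slot string
	open bool
}

// cdcEntry bundles the state the thread suggests could live in one map value.
type cdcEntry struct {
	replState string
	conn      *replConn
}

// cdcCache is a hypothetical manager that owns the map and its locking.
type cdcCache struct {
	mu      sync.RWMutex
	entries map[string]*cdcEntry
}

func newCdcCache() *cdcCache {
	return &cdcCache{entries: make(map[string]*cdcEntry)}
}

// getOrCreate returns the cached entry for a session, creating it only once.
func (c *cdcCache) getOrCreate(session, slot string) *cdcEntry {
	c.mu.RLock()
	e, ok := c.entries[session]
	c.mu.RUnlock()
	if ok {
		return e
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	// Re-check under the write lock: another goroutine may have won the race.
	if existing, found := c.entries[session]; found {
		return existing
	}
	e = &cdcEntry{replState: "started", conn: &replConn{slot: slot, open: true}}
	c.entries[session] = e
	return e
}

// remove closes and forgets the entry when the session ends.
func (c *cdcCache) remove(session string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[session]; ok {
		e.conn.open = false
		delete(c.entries, session)
	}
}

func main() {
	cache := newCdcCache()
	first := cache.getOrCreate("flow-a", "slot_a")
	second := cache.getOrCreate("flow-a", "slot_a")
	fmt.Println("reused:", first == second)
	cache.remove("flow-a")
	fmt.Println("still open:", first.conn.open)
}
```

The read lock keeps lookups from successive sync flows cheap, and the write lock is only taken when a session starts or ends.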

flow/activities/flowable.go

Required updating connector interfaces. Having `ctx` everywhere is a bit annoying, but it's ultimately the correct way. Was running into context complications in #1211 with the connector being shared between activities.

Putting a context in a struct essentially makes that struct a context, but this is not necessarily the context we want. For more context, see https://zenhorace.dev/blog/context-control-go

Some changes were made:

1. GetCatalog now takes a context instead of using `context.Background()`
2. eventhubs processBatch now takes a context instead of using `context.Background()`
3. many instances of `Query`/`Exec` in snowflake/clickhouse were converted to `QueryContext`/`ExecContext`
4. got rid of the cancel context in the ssh tunnel; the context being passed in is sufficient

Followup to #1238
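
The argument for passing `ctx` per call rather than storing it in the struct can be illustrated with a small sketch. It assumes a connector shared by two callers, one of whose contexts has been canceled; `connector`, `fetch`, and `demo` are hypothetical names and do not mirror the real connector interfaces.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// connector is a hypothetical shared connector. It deliberately stores no
// context, so it is not tied to the lifetime of any single activity.
type connector struct {
	name string
}

// fetch takes the caller's context, so each caller controls its own
// cancellation even though the connector itself is shared.
func (c *connector) fetch(ctx context.Context, table string) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", fmt.Errorf("fetch %s: %w", table, err)
	}
	return c.name + "." + table, nil
}

// demo calls the same connector from a canceled caller and a live caller.
// It returns the live caller's result and whether the first call failed
// with context.Canceled.
func demo() (string, bool) {
	conn := &connector{name: "pg"}

	canceledCtx, cancel := context.WithCancel(context.Background())
	cancel() // simulate an activity that has already been canceled
	_, err := conn.fetch(canceledCtx, "users")

	// A second caller with its own context is unaffected by the first.
	res, _ := conn.fetch(context.Background(), "users")
	return res, errors.Is(err, context.Canceled)
}

func main() {
	res, firstCanceled := demo()
	fmt.Println(res, firstCanceled)
}
```

Had the connector captured the first caller's context at construction time, the second call would have failed too, which is the kind of complication a shared connector runs into.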

flow/activities/flowable.go

iskakaushik · 2024-02-16T14:31:31Z

Does using OriginalRunID over a uuid have any unexpected consequences? If we were to reset the state of the workflow, would that re-create the child workflow, or would it complain that the run-id matches?

serprex · 2024-02-16T14:51:40Z

Recreation will have a new RunID. OriginalRunID is used because RunID changes with replays; OriginalRunID maintains determinism

https://pkg.go.dev/go.temporal.io/sdk/internal#WorkflowInfo

FirstRunID is what keeps the RunID of the workflow consistent across ContinueAsNew

flow/activities/flowable.go

flow/connectors/postgres/postgres.go

Fewer side effects, less error handling, and different workflows can be correlated by the same run id. Also pull in some other cleanup from #1211

Only pass config & options to StartFlow, removing StartFlowInput. In fact, while we're at it, rename StartFlow to SyncFlow: the name doesn't really make sense anymore, & it'll make even less sense after #1211

flow/workflows/cdc_flow.go

serprex commented Feb 6, 2024

flow/activities/flowable.go

serprex force-pushed the spiritus-mundi branch 3 times, most recently from c936149 to 772def2 · February 7, 2024 04:40

serprex mentioned this pull request Feb 7, 2024

e2e: require workflows exit with canceled error when expected to be canceled #1214

Merged

serprex added a commit that referenced this pull request Feb 7, 2024

e2e: require workflows exit with canceled error when expected to be c…

202ca2b

serprex force-pushed the spiritus-mundi branch 3 times, most recently from c00b723 to e14b538 · February 7, 2024 17:01

serprex marked this pull request as ready for review February 7, 2024 18:54

serprex requested a review from iskakaushik February 7, 2024 18:54

serprex changed the title ~~Use a single source connector per cdc flow to avoid repeatedly reconnecting START REPLICATION~~ Maintain replication connection between sync flows Feb 7, 2024

serprex requested review from heavycrystal and Amogh-Bharadwaj February 7, 2024 19:11

serprex force-pushed the spiritus-mundi branch 3 times, most recently from a395c50 to 7cdb75b · February 8, 2024 16:05

iskakaushik reviewed Feb 9, 2024

flow/workflows/cdc_flow.go

iskakaushik reviewed Feb 9, 2024

flow/activities/flowable.go

serprex force-pushed the spiritus-mundi branch from c30bd20 to 8124726 · February 9, 2024 14:23

serprex mentioned this pull request Feb 9, 2024

add lint: containedctx #1240

Merged

serprex force-pushed the spiritus-mundi branch 6 times, most recently from 7558024 to ff908e5 · February 15, 2024 19:19

iskakaushik reviewed Feb 16, 2024

flow/activities/flowable.go

serprex force-pushed the spiritus-mundi branch 3 times, most recently from 4ccc9dd to 7583072 · February 16, 2024 13:33

iskakaushik reviewed Feb 16, 2024

flow/activities/flowable.go

iskakaushik reviewed Feb 16, 2024

flow/connectors/postgres/postgres.go

heavycrystal reviewed Feb 16, 2024

flow/connectors/postgres/postgres.go

serprex force-pushed the spiritus-mundi branch from e6857ea to 5bdece2 · February 16, 2024 19:53

serprex mentioned this pull request Feb 19, 2024

refactor fetching and pushing of last offset #1329

Merged

serprex force-pushed the spiritus-mundi branch 2 times, most recently from 92a9064 to 9010c35 · February 20, 2024 18:34

serprex added a commit that referenced this pull request Feb 20, 2024

Prefer OriginalRunID to generating UUID with side effect

7a5755e

serprex mentioned this pull request Feb 20, 2024

Prefer OriginalRunID to generating UUID with side effect #1334

Merged

serprex added a commit that referenced this pull request Feb 20, 2024

Prefer OriginalRunID to generating UUID with side effect

a2a6b4e

serprex added a commit that referenced this pull request Feb 20, 2024

Prefer OriginalRunID to generating UUID with side effect (#1334)

8b591fd

serprex force-pushed the spiritus-mundi branch from 9010c35 to 265fa5c · February 20, 2024 19:32

serprex mentioned this pull request Feb 21, 2024

Remove redundancy between cdc state & sync flow options #1341

Merged

one-sync

0d21eb8

serprex force-pushed the spiritus-mundi branch from 265fa5c to 0d21eb8 · February 22, 2024 04:22

log replState change

2624ed6

iskakaushik reviewed Feb 22, 2024

flow/workflows/cdc_flow.go

syncCtx: 2 week StartToCloseTimeout

e172cef

iskakaushik merged commit 8f4ad4e into main Feb 22, 2024
7 checks passed

iskakaushik deleted the spiritus-mundi branch February 22, 2024 13:54

serprex mentioned this pull request Feb 22, 2024

Explore ways to always read from the slot #690

Closed
